
Summary of ABC for all traits

This report shows summary statistics produced by the disease variant validation pipeline for predictions made by ABC.
It is intended to show how your prediction method performs across all common disease traits, focusing on the enrichment of fine-mapped variants in predicted enhancers (including and excluding promoter regions), the fraction of variants overlapping enhancers (including and excluding promoter regions), and how well the method links noncoding disease variants to their respective genes.
All statistics in this report are computed across all traits.


Fig1. ABC maps connecting fine-mapped variants to enhancers, genes and celltypes. a, Enrichment of fine-mapped variants (PIP>10%) in ABC enhancers. b, Fraction of fine-mapped variants (PIP>10%) overlapping ABC enhancers. c, Number of celltypes enriched for fine-mapped variants (PIP>10%). d, Precision–recall plot of connections between noncoding credible sets and known genes. e, Cumulative density plot of enrichment across all traits.

Below are interactive plots to allow further exploration of how your prediction method has performed across all common disease traits.
Summary statistics (min, max, median and quantiles) are reported here, as well as outlier points.
The first question we ask is:

To what extent are GWAS variants enriched in predicted enhancers?

Enrichment of GWAS variants for enhancers (including promoter regions) and enhancers (excluding promoter regions)

These box plots show the enrichment of fine-mapped variants (PIP >= 10%) in ABC enhancers (left) and in ABC enhancers without promoter regions (promoter regions as defined by RefSeq) (right) across all biosamples. Variants were filtered to include only distal noncoding variants.

Fig2. Enrichment of fine-mapped variants (PIP >= 10%) in enhancers across all biosamples. Box plots show median (middle line) and interquartile range (boxes) and whiskers show observations less than or equal to quartiles ± 1.5× the interquartile range. Outlier points are included in black.


Enrichment tables

Tables of the values used to derive the enrichment and fraction-overlap statistics for enhancers (including promoter regions) and enhancers (excluding promoter regions).

ABC

To what extent do GWAS variants overlap predicted enhancers?

Variant overlap of GWAS variants for enhancers (including promoter regions) (left) and enhancers (excluding promoter regions) (right)

We plot the fraction of fine-mapped variants (PIP >= 10%) that overlap ABC enhancers with and without promoter regions (promoter regions as defined by RefSeq) across all biosamples.

Fig3. Fraction overlap of fine-mapped variants (PIP >= 10%) in enhancers across all biosamples. Box plots show median (middle line) and interquartile range (boxes) and whiskers show observations less than or equal to quartiles ± 1.5× the interquartile range. Outliers are shown in black.


How many enriched celltypes are called across all traits?

Fig4. Number of celltypes significantly enriched (p < 0.05) for GWAS variants relative to all variants, across all traits.


How does the fraction of noncoding variants that overlap a given ABC enhancer in any biosample change given the PIP threshold?

Variants were filtered to include only distal noncoding variants. Blue represents IBD, red represents 12 other blood traits, black represents the average across all traits, and gray represents the remaining 57 traits.

Fig5. Fraction of noncoding variants above a given PIP threshold that overlap an ABC enhancer in any biosample. Black line, weighted average across traits. Traces are shown for PIP thresholds above which there are at least five variants. Dashed line, fraction of all common noncoding variants that overlap ABC enhancers.
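
As a rough illustration of how one trace in this plot could be computed, the sketch below takes a list of variants for a single trait and reports the overlapping fraction at each PIP threshold, skipping thresholds with fewer than five variants. The function name `overlap_fraction_by_pip`, the input layout and the example values are hypothetical, and treating "above a threshold" as `pip >= threshold` is an assumption; this is not the pipeline's own code.

```python
def overlap_fraction_by_pip(variants, thresholds, min_variants=5):
    """Fraction of variants at or above each PIP threshold that overlap an enhancer.

    variants: list of (pip, overlaps_enhancer) tuples for one trait (hypothetical layout).
    Thresholds with fewer than `min_variants` variants are omitted from the trace.
    """
    trace = {}
    for threshold in thresholds:
        # Keep the overlap flag of every variant that passes the threshold.
        selected = [overlaps for pip, overlaps in variants if pip >= threshold]
        if len(selected) < min_variants:
            continue
        trace[threshold] = sum(selected) / len(selected)
    return trace


# Made-up example data: (PIP, overlaps an ABC enhancer in any biosample)
variants = [
    (0.05, False), (0.12, False), (0.20, True), (0.35, False),
    (0.50, True), (0.60, True), (0.80, True), (0.95, True),
]
trace = overlap_fraction_by_pip(variants, thresholds=[0.0, 0.1, 0.5, 0.9])
```

Here the thresholds 0.5 and 0.9 are dropped because fewer than five variants pass them, which mirrors the rule stated in the caption.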

How well do predictions connect noncoding credible sets to known disease genes?

These precision-recall curves show the performance of choosing, for each locus, the gene with the best score.
Precision = fraction of identified genes that are in the list of known genes across all GWAS traits.
Recall = fraction of known genes that were identified.
Fig6. Precision–recall plot of connections between noncoding credible sets and known genes, where recall is the fraction of credible sets for which the known gene is identified (sensitivity) and precision is the fraction of predicted genes corresponding to known genes (positive predictive value). As a baseline, we include the heuristic of assigning each GWAS credible set to the closest gene, a method that is widely used to annotate GWAS loci. A similar approach, which selects the gene with the closest transcription start site (TSS), was also added.
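
The precision and recall definitions above can be sketched as follows for a single score threshold. This is a minimal illustration, assuming each credible set has exactly one known gene and a dictionary of per-gene scores; the function name `precision_recall`, the input layout and the gene names are hypothetical and are not taken from the pipeline.

```python
def precision_recall(loci, threshold):
    """Precision and recall when each locus is assigned its best-scoring gene.

    loci: list of dicts with 'known_gene' and 'scores' ({gene: score}); hypothetical layout.
    A prediction is made for a locus only if its best score is >= threshold.
    """
    n_predicted = 0
    n_correct = 0
    for locus in loci:
        if not locus["scores"]:
            continue  # no candidate genes, so no prediction for this locus
        best_gene, best_score = max(locus["scores"].items(), key=lambda kv: kv[1])
        if best_score < threshold:
            continue
        n_predicted += 1
        n_correct += best_gene == locus["known_gene"]
    # Precision: fraction of predicted genes that match the known gene.
    precision = n_correct / n_predicted if n_predicted else float("nan")
    # Recall: fraction of credible sets for which the known gene is identified.
    recall = n_correct / len(loci)
    return precision, recall


# Made-up loci with placeholder gene names
loci = [
    {"known_gene": "GENE_A", "scores": {"GENE_A": 0.40, "GENE_B": 0.10}},
    {"known_gene": "GENE_C", "scores": {"GENE_C": 0.05, "GENE_D": 0.20}},
    {"known_gene": "GENE_E", "scores": {"GENE_E": 0.03}},
    {"known_gene": "GENE_F", "scores": {}},
]
curve = [precision_recall(loci, t) for t in (0.02, 0.1, 0.5)]
```

Sweeping the threshold from low to high traces out the curve: stricter thresholds make fewer predictions, which lowers recall and may raise or lower precision depending on which predictions remain.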


Methods

GWAS traits and loci
Summary statistics for IBD, Crohn’s disease and ulcerative colitis (European ancestry only) were obtained from https://www.ibdgenetics.org/downloads.html. We obtained fine-mapping results and summary statistics for 71 other traits from an unpublished analysis of UK Biobank data (Jacob Ul, M. Kanai and Hillary Finucane, unpublished data); the fine-mapping data are available at https://www.finucanelab.org/data. In this analysis, up to 361,194 individuals of white British ancestry with available phenotypes, and variants with INFO > 0.8, minor allele frequency > 0.01% and Hardy–Weinberg equilibrium P > 1 × 10^-10, were included in the GWAS. For all traits, except where specified, we considered only the ‘noncoding credible sets’, that is, those that did not contain any variant in a coding sequence or within 10 bp of a splice site annotated in the RefGene database (downloaded from UCSC Genome Browser on 24 June 2017).

Defining enriched biosamples for each trait
For a given trait, we intersected variants with PIP ≥ 10% in noncoding credible sets with ABC enhancers (or other genomic annotations). For each biosample, we calculated a P value using a binomial test comparing the fraction at which PIP ≥ 10% variants overlapped ABC enhancers with the fraction at which all common variants overlap ABC enhancers in that cell type. We calculated the latter using common variants in the 1000 Genomes Project as described in the ‘Stratified linkage disequilibrium score regression’ section. For each trait, we defined a biosample as significantly enriched for that trait if the Bonferroni-corrected binomial P value was <0.001.
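
The test described above can be sketched for one trait and one biosample as follows. Several details here are assumptions rather than statements about the pipeline: the test is taken to be one-sided (upper tail), enrichment is taken to be the ratio of the two overlap fractions, and the Bonferroni correction is taken to multiply by the number of biosamples tested. The function names and all numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_test(n_overlap, n_variants, background_fraction, n_biosamples):
    """Enrichment ratio, raw P value and Bonferroni-corrected P value.

    n_overlap: PIP >= 10% variants overlapping enhancers in this biosample.
    n_variants: all PIP >= 10% variants for the trait.
    background_fraction: fraction of all common variants overlapping the same enhancers.
    n_biosamples: number of tests corrected for (assumed: one per biosample).
    """
    enrichment = (n_overlap / n_variants) / background_fraction
    p_value = binomial_upper_tail(n_overlap, n_variants, background_fraction)
    p_bonferroni = min(1.0, p_value * n_biosamples)
    return enrichment, p_value, p_bonferroni


# Made-up counts: 12 of 40 fine-mapped variants overlap, against a 2% background
enrichment, p_value, p_bonferroni = enrichment_test(12, 40, 0.02, n_biosamples=131)
significant = p_bonferroni < 0.001  # threshold stated in the text above
```

The exact sum is adequate for the small counts typical of fine-mapped variant sets; a statistics library would be a reasonable substitute for larger inputs.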

Comparison of enrichment of fine-mapped variants in enhancer regions and other enhancer-gene predictions
We compared the enrichment of fine-mapped variants in ABC enhancers and other enhancer definitions. We analysed each of the previous studies from Fig 1d (summary plot) reporting cell-type specific enhancer-gene predictions. For each of the methods below, we downloaded previous predictions of enhancer–gene links and assessed their ability to identify disease-associated genes. For this analysis, we used the predictions from each method to overlap fine-mapped variants (PIP ≥ 10%) with enhancers in any cell type and assigned variants to the predicted gene(s).

Calculating the number of enriched celltypes (p < 0.05)
The number of celltypes, across all traits, in which the enrichment of fine-mapped variants (PIP >= 10%) in enhancers is considered significant (p < 0.05).

Enhancer-gene correlation (Activity-By-Contact)
DNase accessibility and acetylation of H3K27 are commonly used to identify enhancer elements, and are predictive of the expression of nearby genes and of enhancer activity in plasmid-based reporter assays. Activity was calculated as the geometric mean of the DNase-seq and H3K27ac ChIP–seq signals. The Contact component of the ABC score for E–G pairs was estimated from Hi-C data (when available) or from the average Hi-C data across 10 celltypes, which was shown to be an accurate substitute when celltype-specific Hi-C data are not available. To calculate the relative effect of each element on the expression of a gene, we normalized the Activity by Contact of one element for a given gene to the sum of the Activity by Contact of other nearby elements. We included all elements within 5 Mb of the gene’s promoter in this calculation. Reference: https://www.nature.com/articles/s41588-019-0538-0
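
The calculation described above can be sketched in a few lines. This is an illustrative simplification, not the ABC implementation: each element is reduced to a single genomic position, the normalizing sum is assumed to run over all elements within 5 Mb including the element being scored, and the function names, field names and values are hypothetical.

```python
from math import sqrt


def activity(dnase, h3k27ac):
    """Activity as the geometric mean of the DNase-seq and H3K27ac signals."""
    return sqrt(dnase * h3k27ac)


def abc_scores(elements, gene_tss, max_distance=5_000_000):
    """Activity x Contact of each element, normalized over all nearby elements.

    elements: list of dicts with 'name', 'position', 'dnase', 'h3k27ac' and
    'contact' (contact frequency with the gene's promoter); hypothetical layout.
    """
    nearby = [e for e in elements if abs(e["position"] - gene_tss) <= max_distance]
    weights = {
        e["name"]: activity(e["dnase"], e["h3k27ac"]) * e["contact"] for e in nearby
    }
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in weights}
    return {name: weight / total for name, weight in weights.items()}


# Made-up elements around a gene whose promoter is at position 1,100,000
elements = [
    {"name": "e1", "position": 1_000_000, "dnase": 4.0, "h3k27ac": 9.0, "contact": 0.5},
    {"name": "e2", "position": 1_200_000, "dnase": 1.0, "h3k27ac": 1.0, "contact": 1.0},
    {"name": "e3", "position": 9_000_000, "dnase": 100.0, "h3k27ac": 100.0, "contact": 0.1},
]
scores = abc_scores(elements, gene_tss=1_100_000)
```

In this example `e3` lies more than 5 Mb from the promoter and is excluded, so the scores of `e1` and `e2` sum to 1.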

Enhancer-gene correlation (ChromHMM2017)
Gene expression was previously correlated with five active chromatin marks (H3K27ac, H3K9ac, H3K4me1, H3K4me2 and DNase I hypersensitivity) across 56 biosamples, and these correlation links were then used to make predictions for the predicted enhancers (regions with the ‘7Enh’ ChromHMM state) in 127 biosamples from the Roadmap Epigenome Atlas. We downloaded these predictions from www.biolchem.ucla.edu/labs/ernst/roadmaplinking and made predictions using the confidence score. Reference: https://www.nature.com/articles/nprot.2017.124


Enhancer-gene correlation (EpiMap)
Gene expression was previously correlated with five active chromatin marks (H3K27ac, H3K9ac, H3K4me1, H3K4me2 and H3K4me3) across 304 biosamples. A negative set of correlations for each enhancer was computed using random genes on a different chromosome. We predicted links for each biosample and ChromHMM enhancer state separately (states E7, E8, E9, E10, E11 and E15). Predictions were made by training an XGBoost classifier on the positive set of all valid links against their paired negative links, using precomputed correlations and distance to the transcription start site as features, and keeping all links with a probability above 5/7. We downloaded these predictions from https://personal.broadinstitute.org/cboix/epimap/links/. Reference: https://www.biorxiv.org/content/10.1101/810291v2